Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification
Authors
Abstract
The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm for building ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectra, such as the first few eigenvectors of the original graph Laplacian. Our approach can immediately lead to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method first constructs low-stretch spanning trees (LSSTs) from the original graphs and then iteratively recovers small portions of “spectrally critical” off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine how many off-tree edges should be recovered to the LSSTs, an eigenvalue stability checking scheme is proposed, which robustly preserves the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed for identifying extra edges that are missing from the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results on a variety of well-known data sets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering.
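The tree-plus-recovered-edges idea described above can be illustrated with a small, pure-Python sketch. It is not the authors' method and makes loud substitutions: a maximum-weight spanning tree (Kruskal's algorithm) stands in for a true low-stretch spanning tree, and off-tree edges are ranked by their stretch `w_pq * R_T(p, q)` (edge weight times the effective resistance along the tree path) instead of the spectral off-tree embedding. The eigenvalue stability check and the densification step are omitted, and `keep_fraction` is a hypothetical fixed edge budget.

```python
from collections import defaultdict


def max_spanning_tree(n, edges):
    """Split weighted edges (u, v, w) on nodes 0..n-1 into tree and off-tree edges.

    Kruskal's algorithm on descending weights; a stand-in for a true
    low-stretch spanning tree (LSST) construction.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, off_tree = [], []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
        else:
            off_tree.append((u, v, w))
    return tree, off_tree


def tree_resistance(tree, src):
    """Effective resistance from src to each reachable node along tree paths."""
    adj = defaultdict(list)
    for u, v, w in tree:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {src: 0.0}
    stack = [src]
    while stack:
        x = stack.pop()
        for y, w in adj[x]:
            if y not in dist:
                # Tree paths are unique, so edge resistances 1/w add in series.
                dist[y] = dist[x] + 1.0 / w
                stack.append(y)
    return dist


def sparsify(n, edges, keep_fraction=0.1):
    """Spanning tree plus the off-tree edges with the largest stretch."""
    tree, off_tree = max_spanning_tree(n, edges)
    scored = []
    for p, q, w in off_tree:
        # Stretch of (p, q) w.r.t. the tree: w_pq * R_T(p, q).
        # One traversal per edge, O(m * n) overall: fine for toy graphs only.
        stretch = w * tree_resistance(tree, p)[q]
        scored.append((stretch, (p, q, w)))
    scored.sort(key=lambda s: -s[0])
    k = int(keep_fraction * len(off_tree))  # hypothetical fixed edge budget
    return tree + [e for _, e in scored[:k]]
```

For example, on a 4-cycle with unit weights plus a chord (0, 2) of weight 0.5, the tree is the path 0-1-2-3; the closing edge (3, 0) has stretch 3.0 and the chord has stretch 1.0, so `sparsify(4, edges, keep_fraction=0.5)` recovers only (3, 0). High-stretch edges are the ones the tree approximates worst, which is why a stretch-like score is a reasonable proxy for "spectrally critical" in a sketch like this.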
Similar resources
Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning
While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples. Recent successful and scalable methods, such as the eigenfunction method [11], focus on efficiently approximating the whole spectrum of the graph Laplacian constructed from the data. This is in contrast to various subsampling and quantization me...
Towards Spectral Sparsification of Simplicial Complexes Based on Generalized Effective Resistance
As a generalization of the use of graphs to describe pairwise interactions, simplicial complexes can be used to model higher-order interactions between three or more objects in complex systems. There has been a recent surge in activity for the development of data analysis methods applicable to simplicial complexes, including techniques based on computational topology, higher-order random proces...
A Note on Spectrum Preserving Additive Maps on C*-Algebras
Mathieu and Ruddy proved that if be a unital spectral isometry from a unital C*-algebra A onto a unital type I C*-algebra B whose primitive ideal space is Hausdorff and totally disconnected, then is Jordan isomorphism. The aim of this note is to show that if be a surjective spectrum preserving additive map, then is a Jordan isomorphism without the extra assumption totally disconnected.
A Novel Algorithm of Sparse Representations for Speech Compression/Enhancement and Its Application in Speaker Recognition System
This paper proposes sparse and redundant representation-based spectral-domain compression of the speech signal, applying novel sparsifying algorithms to the problems of speech compression (SC) and enhancement (SE). In Automatic Speaker Recognition (ASR), sparsification can play a major role in resolving big-data issues in speech compression and its storage in the database, where the speech signal can be uncompress...
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection
The constrained spectral clustering (CSC) method can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and has thus received wide academic attention. In this paper, we propose a fast CSC algorithm by encoding landmark-based graph construction into a new CSC model and applying random sampling to decrease the data size after spectral...
Journal title:
- CoRR
Volume abs/1710.04584
Publication date: 2017